Feature selection methods for optimizing clinicopathologic input variables in oral cancer prognosis.
Authors
Abstract
The incidence of oral cancer is high among people of Indian ethnic origin in Malaysia. Various clinical and pathological data are usually used in oral cancer prognosis. However, due to time, cost and tissue limitations, the number of prognosis variables needs to be reduced. In this research, we demonstrate the use of feature selection methods to select a subset of variables that is highly predictive of oral cancer prognosis. The objective is to reduce the number of input variables and thereby identify the key clinicopathologic (input) variables of oral cancer prognosis, based on data collected in the Malaysian scenario. Two feature selection methods, a genetic algorithm (wrapper approach) and Pearson's correlation coefficient (filter approach), were implemented and compared with single-input models and a full-input model. The results showed that the reduced models built with feature selection produce more accurate prognosis results than the full-input model and the single-input models, with Pearson's correlation coefficient achieving the most promising results.
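As a rough illustration of the filter approach named in the abstract, the sketch below ranks input variables by the absolute value of their Pearson correlation with the outcome and keeps the top few. It is not the authors' implementation: the function name `pearson_filter`, the choice of `k`, and the synthetic data are assumptions made purely for illustration, and the abstract does not say how many variables were retained or how the ranking was thresholded.

```python
import numpy as np


def pearson_filter(X, y, k):
    """Return the indices of the k features with the largest |Pearson r|
    against y, together with the full vector of correlations.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center each feature column and the outcome.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Pearson r = covariance / (std_x * std_y); the 1/n factors cancel.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

    # Constant columns have zero variance; report their correlation as 0.
    r = np.divide(Xc.T @ yc, denom,
                  out=np.zeros(X.shape[1]), where=denom > 0)

    # Sort by descending absolute correlation and keep the first k.
    order = np.argsort(-np.abs(r), kind="stable")
    return order[:k], r


# Synthetic stand-in data (NOT clinical data): 200 cases, 5 candidate
# variables, with the binary outcome driven by variables 1 and 3 only.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = (X[:, 1] - X[:, 3] + 0.1 * rng.normal(size=n) > 0).astype(float)

selected, r = pearson_filter(X, y, k=2)
print("selected feature indices:", sorted(selected.tolist()))
print("correlations:", np.round(r, 3))
```

A filter of this kind scores each variable independently of any prognostic model, which makes it cheap to compute. A wrapper such as the genetic algorithm instead evaluates candidate subsets by training and scoring a model on each one, which costs more but can account for interactions between variables.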
Similar resources
Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem
Applying data mining methods as a decision support system is of great benefit in predicting the survival of new patients. It also has great potential for health researchers investigating the relationship between risk factors and cancer survival. However, due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...
A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum
Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...
Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we ...
Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors
Abstract: With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing the large number of network traffic features. Removing un...
Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data
Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...
Journal title:
- Asian Pacific journal of cancer prevention : APJCP
Volume 12, Issue 10
Pages: -
Publication date: 2011